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We present a new method for reducing the stochastic noise of all-to-all propagators based on 
stopping the inversion of the propagator before convergence. The method is easy to implement, 
unbiased and independent of the quark action. Applying this method to the calculation of dis- 
connected loops needed for hadronic structure observables we find savings in computer time of 
factors of 4 — 12 depending on the operator inserted in the loop. When combined with a hop- 
ping parameter expansion technique we obtain combined gains of up to factors of 30 for some 
operators. 
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1. Introduction 

Nucleon structure observables such as baryon form factors and moments of (generalised) par- 
ton distributions are extracted from 3pt functions which have connected and disconnected contri- 
butions. The latter, of the form Tr(rM _1 ) x2pt function, are normally omitted as they require the 
calculation of all-to-all propagators Mj^J . Instead often differences between observables, for exam- 
ple gA = Au — Ad, are quoted. However, any settling of the question of the spin or the strangeness 
content of the nucleon requires a calculation of the corresponding disconnected loops. In the fol- 
lowing we present a new method for calculating all-to-all propagators which reduces the associated 
stochastic noise and should make such calculations more viable. This is a general method which 
can be applied to all cases where all-to-all propagators are needed. 

1.1 Stochastic methods for all-to-all propagators 

The standard method for computing all-to-all propagators is via stochastic sampling. A set of 
random complex Z(2) noise vectors, |t]'), I = 1 . . .L, is generated for which, 

i-£|T 7 ')(T J '|=l + 0(l/v^), 7LlV) = 0(l/VL). (1.1) 

Using these vectors as sources one can construct an unbiased estimate of the all-to-all propagator, 
El(M _1 ), using |tj ) and the corresponding solution vectors \s'} = M |tj ): 

E L (M- 1 ) = l£|,')(T 7 / | =M- 1 +M-Ml£|T 7 / )(r 7 / |-lj. (1.2) 



From eqns. [O] and 1.2 it is clear that the stochastic error on the estimate only depends on the off- 



diagonal elements of jYi In'Xn'l and falls off as 0(1/a/L). For a fixed number of configurations, 
depending on the quantity studied (obtained using E/^M -1 )), the stochastic noise can dominate 
over the gauge noise and additional noise reduction techniques are required. 

It is important to note that any noise reduction techniques should be unbiased and the re- 
sulting reduction in noise should justify the computational overhead. Existing techniques include 
"partitioning" |jl]] where each noise vector \r} 1 ) is replaced by a set of partitioned vectors \r] 



p = I...P, where \ f]')p has many zeros. By zero-ing entries in the source vector one will avoid 



some of the large off-diagonal elements contributing in eqn. |L2|; the hope is that a smaller variance 
is obtained for the same amount of computer time despite P times as many inversions. Wilcox [p 
found the gain (in terms of computer time) for Tr(rM _1 ) for colour-spin partitioning to depend 
strongly on Y but could be in the region of factors of 3 — 7 (for T = y^ v and YnYs) or higher (for y 5 ). 
The Kentucky group ^ take a different approach and use the hopping parameter expan- 



sion (HPE), to construct traceless estimates of the off-diagonal elements in eqn. \l.2\ Subtracting 
these estimates from Tr(rM _1 ), where M = 1 — Kp, leaves the trace unchanged but reduces the 
variance. This approach should work well in the heavy quark regime, for example for masses down 
to the strange quark mass. Mathur and Dong ^ found a gain of a factor of 6 — 7 subtracting 
up to K 4 1ft 4 for the strangeness contribution to the magnetic moment of the nucleon G S M (0). The 
computational overhead of performing the subtraction was not significant. 



2 



Disconnected contributions to hadronic structure 



Sara Collins 



Additional approaches also exist: for example in the light quark regime one can calculate the 
low lying eigenmodes of the Dirac operator and use these to estimate part of the propagator [Q]. 
The remainder can be calculated stochastically [||]. Different methods can often be combined. 

1.2 A new approach: unbiased truncation of the solver 

We present a new method for noise reduction which involves stopping the inversion of the 
stochastic propagator before convergence, i.e. using n t iterations in the solver to obtain \J n ) = 
M^T]') instead of running to convergence using n c iterations and obtaining |j* ) = M^^T]'). 
The difference between M,^ 1 and M" 1 can be estimated stochastically using an independent set of 
sources: 

EIM,; 1 ] = ElJM^I+E^M^-M- 1 ]. (1.3) 

This is based on an exact linear decomposition and the algorithm with which both parts are calcu- 
lated is well defined. Using two independent sets of noise vectors for the two parts then implies 
an unbiased estimate of M~ 1 . If the inverter converges rapidly significant gains in computer time 
can be obtained. Rapid convergence means that M^ 1 is very close to M,^ 1 even for small n t <^n c . 
Hence, the stochastic error can be reduced by performing a large number of cheap inversions for 
EijMr ], L\ S> L2, and only a small number, L 2 , of expensive inversions to calculate the small 
correction. 

To check this method we compared the exact result for (M -1 )^ 1,l2e2 , where s\c\ denotes the 
spin and color indices, x = (0, 0, 0, 3) and y = (z, 0,0, 3), i = . . . 10, with an estimate obtained from 



eqn. 1.3. As expected we find consistency within errors for different n t , L\ and L 2 . For example, for 
n t = 5, L x = 5500, L 2 = 300, i = l,sl=s2= 1, c\=cl = 2, EJM" 1 ] = (0.0300(7), -0.0014(7)) 
compared to the exact result of (0.0302 . . . , -0.0010 . . .). 

We now have two parameters, n t and the ratio Z4/L2, which need to be fixed, ideally, so as 
to minimize the variance of the disconnected loop, Tr(FM~ ), at fixed cost. For L\,L2^> 1 the 
variance (Var) is given by 

Var Ll [Tr(rM n ; 1 )]+Var L2 [Tr(r(M- r 1 -M n - 1 ))] = f + (1.4) 

where /j and f 2 depend on n t and Y, while the approximate cost is given by 

C = L\n t +L2n c . (1.5) 

Using Lagrange multipliers and assuming f\ to be approximately independent of n t we obtain the 
optimal values 



opt 



1 hh U fx 



(1.6) 



where f 2 = df2/dn t . For our observables we find that using these optimal values leads to « 

Additional gain can be obtained by combining with other noise reduction techniques. Here we 
consider the HPE approach [||, ||]. The expansion of E[Tr(rM^ 1 )] to order m is given by: 

EfTr^-. 1 )] = lE[(i I , |r|V> + (V|r^|V) + ... + <T I i |ri C *^»|V 



+E[Tr(rfC m+ V n+1 M- c 1 )], (1.7) 
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where, since this is a geometric series, the last term gives the remainder, Y^= m +\ {t] 1 \Yk p p p \r] 1 ), 
averaged over stochastic sources. One can omit terms in the expansion which only contribute to 
the noise. All odd terms, Tv(Tp 2m+l ) = 0, V T. For the even terms, Tr(T) = V T / 1, while 1 
Tr(r$> 2 ) = V T and for T = 7,75 and y 5 , even Tr(T$> 4 ) = Tr(T$> 6 ) = 0. Hence, for T = 7,75 and 
75, since all terms up to 8th order only contribute to the noise, an improved estimate of the trace is 
given by E[Tr(rfC 8 ^) 8 M -1 )]. For all other 7 combinations we use 2 E[Tr(rfC 4 p A M~ 1 )]. The 4th 
and 6th order terms can be calculated explicitly [0] to achieve the same level of improvement as for 
r = 7/i 75 and 75 , however, we have not done so in this study. To combine with our truncated solver 
method we substitute, for example, K^p^MT 1 for M _1 in eqn. 1.3. 



2. Results 

We have performed an exploratory study of our method using configurations provided by the 
Wuppertal group: these are tif = 2+ 1 dynamical configurations generated using a Symanzik im- 
proved gauge action and a stout-link improved staggered fermion action. The lattice spacing is 
fairly coarse, a -1 « 1.55 GeV while the volume is around 2 fm. Further details can be found in 
For valence quarks we used the Wilson action with K = 0.166, 0.1675 and 0.1684 corresponding to 
pseudoscalar masses of about 600, 450 and 300 MeV respectively. Our main results were obtained 
using the conjugate gradient algorithm with even-odd preconditioning to perform the propagator 



inversions. However, section g3| will show results obtained using the stabilised biconjugate gra- 
dient algorithm (BiCGStab). The code used throughout was a modified version of the Chroma 
code 

Results are presented below for the disconnected loop, Tr(rM _1 ), where we have considered 
r = 1, 7m, 7^75, o^ v , 75- Using M _1 = 75(M~ 1 ) 1 ~7 5 one can show that the trace is either real or 
imaginary 3 . At this initial stage we are only interested in the stochastic error and, hence, the results 
are presented for the trace on a single configuration. In addition to combining our method with the 
HPE approach we also partition in time: |t]') are only non-zero for t = 3. 

2.1 Truncating the solver 

The truncated solver method (TSM) relies on Tr(rM~ 1 ) coming close to the convergent value 
after only a few iterations of the inverter. We found this to be true for all Ts studied and the 
example of T = 1 is shown in figure [j] Clearly the trace is close to the limiting value after 20 
iterations (compared to the 480 iterations needed for convergence). Proceeding to the calculation 
of the optimal values for n t and L\/Li, we use Tr^M^ 1 ) and Trp^M^ 1 — M^ 1 )] calculated on a 
single set of stochastic estimates, L = 300, for n t = 2 to 100 in steps of 2 iterations to estimate f\, 
fi and f 2 as functions of n t . Using eqn. L6 we obtain the optimal values given in table []]. The 



results are presented for a subset of Ts and show that all values for n° pt are small, but also that n° t pt 
and Z4/L2 depend on the Y used. 

No error analysis has been attempted for these values and they should be considered rough 
estimates. However, we have increased the number of stochastic estimates to 500 and no significant 



'These terms are zero for the Wilson action. For the clover action only the m = term can be omitted. 

2 Where for F — 1 we construct the non-vanishing Tr 1. 

3 Of course the path integral expectation value (TrriVP 1 ) = V T 7^ 1. 
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Figure 1: The disconnected loop for r = 1 as a function of the number of iterations used in the inverter 
for M for K = 0.166. The loop is shown for (left) L = 1 where the horizontal line shows the value at 
convergence ( n c = 480) and (right), with errors, L = 300. 
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Table 1: Optimal values for n t and Z4/L2 for a subset of the Ts studied, calculated using L = 300 for 
K = 0.166. The gains obtained for the estimate of TrfTM -1 ) using these optimal values at fixed cost are 
also shown. Where our method is combined with the HPE technique, m indicates the order used. 



change in the results was found. Using n° pt and Z4/L2 we can calculate the gain in computer time 
using the TSM at fixed cost this time with L\ and L2 independent stochastic sources. The cost, 
to be inserted in eqn. 1.5, is set by generating 300 stochastic estimates of Tr(rM _1 ). The gain 
corresponds to 

VarfT^nVT 1 )] 



Gain 



(2-1) 



Var^tTM-^pSM] 

Table [l] shows the TSM to result in significant gains for all Ts studied, including T = 1. Note that, 
if time partitioning is not used in nominator and denominator these numbers are likely to be much 
larger. 

We expect further variance reductions to be achieved when combining our method with the 
HPE technique discussed in section L2. Figure § shows the disconnected loop for K = 0.166, 
which corresponds to about 20% below the strange quark mass, for T = 1 and 7375. We see that for 
r = 1 the variance does not reduce significantly as kT/) is applied up to the limit of m = 4. This is 
also the case for T = 74. However, for all other Ts significant reductions in the variance are found, 
as seen for 7375. 

Once combined with the TSM, the optimal values, n° pt and L1/L2 must be recalculated. Ta- 
ble [l] shows that n° t pt increases compared to using TSM alone, however, it is still much less than 
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Figure 2: The disconnected loop for T = 1 and 7375 as a function of the number of applications of kj 
applied to the propagator for K = 0. 166 and L = 200. Time partitioning has been used. 
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Table 2: The variation in the gains for Tr(TM ) as the quark mass is decreased. 



n c = 480. With these values increased gains are obtained for all Ts; most notably for 7375 an over- 
all gain of a factor of roughly 30 is obtained. These factors were calculated taking into account 
the cost of the applying the p; for example application of p A corresponds to 5% of the cost of a 
propagator inversion with n t = 66. It may be possible to increase the gain for T = 1, 7^ and cfy v 
by explicitly calculating the 4th and 6th order in the HPE. 

2.2 Effect of decreasing the quark mass 

The results presented so far have been for a quark mass slightly below the strange quark mass. 
If the quark mass is reduced further, table || shows that down to mps ~ 300 MeV there is no 
significant change in the values for the TSM method. As expected, the HPE technique becomes 
less effective as the quark mass decreases and this is reflected in the drop in the factors for the 
combined TSM+HPE. Nevertheless, at 300 MeV the gain is still > 2 times that for the TSM 
method alone for some of the Ts. 

2.3 Using a different solver 

The results in the previous sections were obtained using the conjugate gradiant (CG) algorithm 
in the solver. We are repeating the study using BiCGStab to see whether we can also achieve high 
gains with a more optimized solver. BiCGStab converges in less iterations than CG, for example, 
n c = 156 compared to 480 for CG at fc = 0.166. However, each iteration is more expensive. 
Furthermore, BiCGStab does not converge smoothly. This means we cannot calculate optimal 
values for n, and Z4/L2 (which depend on df2/dn t ). However, we can fix Z4/L2 ~ /1//2 by 
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requiring Var[Tr(rM^ 1 )] «Var[Tr(r(M,7 1 — M^ 1 ))] and vary n t to find the best gain. Initial results 
using n t = 14 give, for example, gains of 9 and 24 for T = 7375 using the TSM and TSM+HPE 
respectively, similar to the factors obtained using the CG solver. 

3. Summary 

The truncated solver method works well, providing gains in computer time of factors of 4 — 12 
for the disconnected loop, depending on the operator, for quark masses in the range of nips = 
600 — 300 MeV. The method is easy to implement, independent of the quark action and, as we have 
shown, can be combined with other methods like the HPE technique to obtain gains of factors of 
around 30 for some operators. Future work will include combining our method with the truncated 
eigenmode approach and a study of the size of the gauge noise. 
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